Increasing Accuracy While Maintaining Minimal Grammars in CKY Parsing
Abstract
Significant work in both lexicalized and unlexicalized parsing has been done in the past ten years. F1 scores of over 90% have been achieved (Bikel, 2005), and linguistic notions such as lexical dependencies and head words have been harnessed to create significant improvements in probabilistic CFG parsing. We note, however, that many of the techniques for improving lexicalized parsing yield relatively little gain while making the algorithms more complex. While Collins's (1999) parser is extremely useful if F1 is of the utmost importance, Klein and Manning's (2003) parser achieves an F1 within 5% of that parser without invoking lexicalization.
For our project, we attempted to increase the F1 (the combined measure of precision and recall) of the CKY parser built for project four. Our goal was to significantly improve F1 while keeping the parser unlexicalized and maintaining a relatively small number of non-terminals. Although such a parser might not be as accurate as lexicalized models or unlexicalized models with larger grammars, we wished to show that acceptable F1 scores can be attained using minimal grammars and no lexicalization; we hypothesized that an F1 score within 1% of the best unlexicalized parser we found in the literature (Klein and Manning, 2003), which has an F1 of 86.36%, was achievable. The minimal grammar allows for fast parsing; given a fully optimized parser, this grammar might be used for extremely quick trials that approximate the results of larger, more accurate grammars, or for pre-processing purposes.
In increasing the F1 of our parser, we modeled our changes closely on those described in "Accurate Unlexicalized Parsing" (2003) by Dan Klein and Chris Manning. This paper provided the clearest suggestions for improving unlexicalized parsing and allowed us to explore linguistic patterns that are useful in parsing.
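Since every result here is reported as F1, a minimal sketch of how a labeled-bracket F1 can be computed for one sentence may help. The span representation `(label, start, end)` and the function name `bracket_f1` are assumptions made for illustration; the abstract does not say which scoring tool the authors used.

```python
from collections import Counter


def bracket_f1(gold, predicted):
    """Labeled-bracket precision, recall, and F1 for one sentence.

    Each argument is a list of (label, start, end) constituent spans.
    This representation is an assumption, not the authors' evaluator.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection counts brackets that agree in label and span.
    matched = sum((gold_counts & pred_counts).values())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical spans for a three-word sentence.
gold = [("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("NP", 2, 3)]
pred = [("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("PP", 2, 3)]
precision, recall, f1 = bracket_f1(gold, pred)
```

In this toy case three of the four predicted brackets match the gold brackets, so precision, recall, and F1 all come out to 0.75.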
All of our improvements were made by annotating the grammar in various ways to reflect both external areas of the parse tree that might affect the current non-terminal's behavior and internal properties of that non-terminal or the structure below it. Our baseline parser was the one created for project four. This parser included second-order vertical Markovization and first-order horizontal Markovization. Additionally, it annotated preterminal nodes with the tag of their parents, just as vertical Markovization annotates other nodes with their parent tag. This parser produced a baseline F1 of 81.92%. Given previous results suggesting that increasing the order of vertical Markovization vastly increases the number of tags and is difficult to further …
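The baseline annotations described above can be sketched as two tree transforms. This is an illustrative sketch, not the project's code: the nested-tuple tree format, the `^` parent marker, the `@X->_Y` naming of intermediate symbols, and the function names are all assumptions.

```python
def annotate_parents(tree, parent=None):
    """Second-order vertical Markovization: append the parent label.

    A tree is (label, child, ...) and a word is a plain string.
    Preterminals are annotated too, as in the described baseline.
    """
    if isinstance(tree, str):
        return tree  # words are left unchanged
    label = tree[0]
    new_label = f"{label}^{parent}" if parent is not None else label
    children = tuple(annotate_parents(child, label) for child in tree[1:])
    return (new_label,) + children


def _label(node):
    """Label of a subtree, or the word itself for a leaf."""
    return node if isinstance(node, str) else node[0]


def binarize(tree):
    """Right-binarize with first-order horizontal Markovization.

    Each intermediate symbol remembers only the most recent sibling.
    """
    if isinstance(tree, str):
        return tree
    label = tree[0]
    children = [binarize(child) for child in tree[1:]]
    if len(children) <= 2:
        return (label,) + tuple(children)
    # Build the chain of intermediate nodes from the right edge inward.
    node = (f"@{label}->_{_label(children[-3])}", children[-2], children[-1])
    for i in range(len(children) - 3, 0, -1):
        node = (f"@{label}->_{_label(children[i - 1])}", children[i], node)
    return (label, children[0], node)


# Hypothetical example tree.
tree = ("S",
        ("NP", ("DT", "the"), ("JJ", "quick"), ("NN", "fox")),
        ("VP", ("VBD", "ran")))
annotated = annotate_parents(tree)
binarized = binarize(annotated)
```

Here the subject `NP` becomes `NP^S`, its determiner becomes `DT^NP`, and the three-child `NP^S` is split with one intermediate node, `@NP^S->_DT^NP`. Each extra level of vertical context multiplies the number of distinct labels, which is the tag growth the paragraph above alludes to.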
Similar resources
Iterative CKY Parsing for Probabilistic Context-Free Grammars
This paper presents an iterative CKY parsing algorithm for probabilistic context-free grammars (PCFG). This algorithm enables us to prune unnecessary edges produced during parsing, which results in more efficient parsing. Since pruning is done by using the edge's inside Viterbi probability and the upper bound of the outside Viterbi probability, this algorithm guarantees to output the exact Viter...
Parallelizing the CKY and Earley Parsing Algorithms
Context-free parsing algorithms are one of the oldest and most well-understood aspects of natural language processing. Efforts to reduce the time complexity of these algorithms have produced two particularly popular algorithms: the Cocke-Kasami-Younger (CKY) bottom-up parsing algorithm [5, 9], and the Earley top-down parsing algorithm [2, 3]. However, despite these efforts, parsing remains a tim...
A polynomial parsing algorithm for the topological model Synchronizing Constituent and Dependency Grammars, Illustrated by German Word Order Phenomena
This paper describes a minimal topology-driven parsing algorithm for topological grammars that synchronizes a rewriting grammar and a dependency grammar, obtaining two linguistically motivated syntactic structures. The use of non-local slash and visitor features can be restricted to obtain a CKY-type analysis in polynomial time. German long-distance phenomena illustrate the algorithm, bringing ...
Efficient Implementation of the CKY Algorithm
When the CKY algorithm is presented in Natural Language Processing literature, it is often described in high-level pseudocode. The implementation details of the CKY algorithm, despite being critical to efficiency, are rarely (if ever) discussed. In this paper I discuss multiple implementation approaches, and optimizations on these approaches to improve parsing time by an order of magnitude wh...
A CKY parser for picture grammars
We study the complexity of the membership (parsing) problem for pictures generated by a family of picture grammars: Siromoney's Context-Free Kolam Array grammars (which coincide with Matz's context-free picture grammars). We describe a new parsing algorithm that extends the classical Cocke, Kasami and Younger parsing technique for string languages and preserves its polynomial time complexity.